ABSTRACT





Decision making is an important element throughout the life-cycle of large-scale projects. Decisions are 
critical as they have a direct impact upon the success/outcome of a project and are affected by many fac- 
tors including the certainty and precision of information. In this paper we present an evidential reasoning 
framework which applies Dempster-Shafer Theory and its variant Dezert-Smarandache Theory to aid 
decision makers in making decisions where the knowledge available may be imprecise, conflicting and 
uncertain. This conceptual framework is novel as natural language based information extraction tech- 
niques are utilized in the extraction and estimation of beliefs from diverse textual information sources, 
rather than assuming these estimations as already given. Furthermore we describe an algorithm to define 
a set of maximal consistent subsets before fusion occurs in the reasoning framework. This is important as 
inconsistencies between subsets may produce results which are incorrect/adverse in the decision making 
process. The proposed framework can be applied to problems involving material selection and a Use Case 
based in the Engineering domain is presented to illustrate the approach. 


Information fusion 
Knowledge and data engineering 


1. Introduction 


Decision making in large-scale projects is often a sophisticated
and complex process, and the choices made have an impact on diverse
stages of the project life-cycle. Decision making in such complex
projects involves the evaluation of multiple design decision
options against criteria such as detailed requirement specifications
and Industry Standards. Evidence supporting or opposing decisions
can be extracted from diverse heterogeneous information sources
including trade studies, Pugh matrices and expert discussions.
However, these evidence sources vary in terms of reliability,
completeness and precision, and may contain conflicting information.
Furthermore, the tracking and modeling of this evidence and
rationale is currently lacking in the decision making process,
making it challenging for decision makers to make critical decisions.

The research proposed in this paper outlines a novel design
framework whereby evidence is extracted based on Natural Language
Processing (NLP) techniques. The retrieved evidence will
form the basis of an evidential reasoning system to aid decision
makers in the decision making process. This framework is applicable
to any problem which is based on a set of alternatives where
the support for any given alternative can be expressed in
propositional logic based on the presence of various attributes. To
illustrate the application of our proposed conceptual framework we
use a Use Case based in the Aerospace domain. The design decisions
or hypotheses in this framework relate to a common problem
in manufacturing design, namely the best choice of material in
the design of a component. We adopt a simple list of choices to
aid in the exposition of our process. Other examples of problems
which can also be described in terms of a choice of a given
alternative based on attributes include the choice of a particular product
design [1] or the choice of best performing motorcycle [2].

The application of this novel framework integrates diverse re- 
search areas including NLP based information extraction and evi- 
dential reasoning. This framework will extract evidence from 
unstructured information sources which vary in terms of their 
quality and reliability and are combined using evidential reasoning 
techniques. This approach diverges considerably from evidential 
reasoning methods in multi-attribute decision making (MADM) 
which are described in [3,4,2]. The starting point in such work is 
that the relevant attributes or properties and their quantitative 
values are fully specified. We make no such assumption;
rather, our starting point for collating evidence is a number of
unstructured discursive documents which make qualitative statements
that may have varying levels of relevance to a specific design
problem.

Information extraction focuses on identifying key sentences in
documents and determining whether any key sentence
in a document is judged to entail any proposition in a
knowledge base. Textual entailment is a relatively new area of
research in NLP and is based on determining whether the
truth/validity of one piece of text is entailed by (can be inferred
from) another (often larger) snippet of text [5]. This entailment
relation is usually based on human judgment and may not always
be possible to derive from logical inferencing [6]. There
are many approaches to the entailment task, which use various
NLP components including similarity based
measures, anaphora resolution, paraphrasing, syntactic graph
alignment, named entity recognition, semantic parsing and logical
inference based on model theoretic approaches. The entailed
propositions determine which expert defined rules are
applicable.

A number of techniques including Bayesian belief networks,
fuzzy logic, rough set theory and evidence theories have been applied
to handle imprecise and uncertain information [1,7-9]. Evidence
theories provide important reasoning mechanisms in
Artificial Intelligence and Information Fusion. These theories have
been successfully applied to a variety of problems involving
imprecise and incomplete information [10].
For example, theories of evidence including Dempster-Shafer Theory
(DS) [11] and Dezert-Smarandache Theory (DSm) [12] have
previously been applied in the domain of Aerospace to handle
uncertainty when fusing sources of information for decision making
purposes. Such areas have involved sensor Information Fusion
[13] and target identification [12], where systems are required to
deal with imprecise information and conflicts which may arise
among sensors. A study in [14] provides an example of how
argumentation and reasoning can be applied to handle uncertainty and
conflicts in decision making. For both DS and DSm theory, mass
functions representing belief are the main concepts that are applied
to carry out uncertainty reasoning [10]. In this research we
propose to fuse imprecise and potentially conflicting information
sources using an evidential reasoning framework based on DS
and DSm theory. To ensure consistency between the basic
belief assignments (bbas) before the fusion process, we construct
maximal consistent subsets based on
a proposed algorithm. This step is required as the integration
of conflicting sources can result in incorrect decision predictions.
The metric distance measure [15] is applied to measure the
similarity between the subsets. Subsets which are not considered
part of the maximal consistent subset are subjected to reliability
discounting using Shafer’s classical discounting approach
described in [16]. The evidential reasoning process
of maximally consistent subsets was initially proposed in [17],
which solely concentrated on evidential reasoning in the presence
of available evidence.

The rest of the paper is organized as follows. We first describe the
information extraction techniques applied in the framework to define
the bbas. The evidential reasoning process is then described
in Section 3, which applies the knowledge and rules extracted from
the Information Fusion phase. An overview of the proposed
Methodology is provided in Section 4, which gives more detail on the
implementation issues concerning information extraction and evidential
reasoning. To illustrate the application of the proposed framework a
Use Case is presented in Section 5. We discuss the findings related
to the Use Case, and areas for improvement, in Section 6. Finally the
key conclusions and contributions of this study are described in
Section 7.


2. Collation of evidence 


The process of information extraction for the collation of evidence
depends on the use of a knowledge base and a trained entailment
model. A key assumption of this paper, illustrated in the
Use Case, is the ability of a designer to specify a knowledge base
without uncertainty which lists propositional qualitative statements
describing a material and a desirable or undesirable property.
In addition the knowledge base contains simple inferential
rules, where each rule is a propositional logic implication
whose head is a design decision (a material choice)
and whose body is based on a material and associated desirable or
undesirable properties. It is natural to assume that a designer would
be able to specify a list of such propositions and simple rules.
However, it is not the case that such information will always be applicable
in analyzing a document, and its level of support will vary per
document. Each document may trigger a varying number of rules, or
none at all, in support of one or more hypotheses, and only by using
evidential reasoning can we make an informed decision as to the most
appropriate material choice.

In effect we are assuming the presence of background knowledge.
Support for or against a given proposition may be written in a
number of ways, and we cannot expect the designer to specify all
possible propositions. We therefore apply a textual entailment model [18]
to check whether extracted sentences of interest entail any of the
propositions in our knowledge base, to aid in the process of
detecting supported propositions. The process of entailment is based on
the principle that a reader can infer from one sentence (referred to
as the text) that another sentence (referred to as the hypothesis) is
true. A closely related mechanism to entailment is paraphrasing;
however, whereas paraphrasing allows for a symmetric entailment
relationship, in standard entailment the hypothesis may not be
shown to entail the text [19].

Given a textual source of information,
we wish to extract key sentences from the source, where a key
sentence contains a reference to a material and one or more
properties. For each key sentence related to a material, a check is made,
based on an application of the entailment model, whether any
proposition in the knowledge base related to the same material
is shown to be true. For a list of true propositions, we check
whether any of the rules fires. The output from this step is an
indexed list of the rules that fire, where each rule supports a possible
material selection. Note that the designer would not be able to
manually derive an overall design decision as to a material choice
and depends upon evidential reasoning mechanisms to provide
scientific support for an overall decision.


2.1. Knowledge base 


The knowledge base is divided into four sections: materials, 
properties, propositions and rules. We consider three materials 
for the purpose of this study: Aluminum, Titanium and Composite. 
This limited choice of materials aids in the exposition of the
process. The constant values for these materials as used by propositions
and rules are denoted by Al, Ti and Comp respectively. Properties
consist of a three-letter abbreviation denoting the property and a short
description of its meaning. Only known materials and properties 
may be referenced in propositions and rules. Propositions are in 
the form: 


⟨Material⟩ ⟨Property⟩ ⟨Statement⟩


where ⟨Statement⟩ is an example of a textual proposition which
relates a property to a material. ⟨Material⟩ is set to one of the material
constants and ⟨Property⟩ is one of the possible properties as denoted
by its abbreviation. The use of the property in a proposition may be
negated.

Rules are defined in Backus-Naur form [20]:


⟨Material⟩ & ⟨Formula⟩ => ⟨Material⟩

⟨Formula⟩ ::= ~⟨Formula⟩
            | (⟨Formula⟩ & ⟨Formula⟩)
            | (⟨Formula⟩ || ⟨Formula⟩)
            | Property                                        (1)


where ~ denotes negation, || denotes logical or and & logical and. In 
effect, a rule body indicates a material and logical formula related to 
the properties for that material and the head of the rule is the rec- 
ommendation of a material as the material choice. 

Part of the knowledge base is shown in Fig. 1, where only certain
properties, propositions and rules are listed. Note that the
property Dam (indicating the detection of damage), leads to a rule 
whereby its negation in combination with Comp implies either 
that Al or Ti should be selected, reflecting the problem of detecting 
damage in composites. 
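The way a rule body is checked against the entailed propositions can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the paper: the rule list is a hand copy of the rules in Fig. 1, the rule indices are simply positions in that list, and the three-valued treatment of unknown properties (a literal such as ~Dam is true only when a negated proposition has been entailed, not merely when Dam is absent) is an assumption made here.

```python
import re

MATERIALS = ("Al", "Ti", "Comp")

# Hand copy of the rules in Fig. 1 as (body, head); indices are list positions.
RULES = [
    ("Al & Dam", "Al"),
    ("Comp & ~Dam", "Al"),
    ("Comp & ~Dam", "Ti"),
    ("Ti & Cor", "Ti"),
    ("Ti & Cos", "Comp"),
    ("Ti & Rat", "Ti"),
    ("Ti & Wei", "Ti"),
]

# Three-valued connectives: None means "nothing is known" (assumption).
def k_not(a):
    return None if a is None else (not a)

def k_and(a, b):
    if a is False or b is False:
        return False
    return None if (a is None or b is None) else True

def k_or(a, b):
    if a is True or b is True:
        return True
    return None if (a is None or b is None) else False

def evaluate(formula, facts):
    """Evaluate a rule body; '||' binds loosest, then '&', then '~'."""
    tokens = re.findall(r"\|\||[&~()]|\w+", formula)
    pos = 0

    def parse_or():
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos] == "||":
            pos += 1
            value = k_or(value, parse_and())
        return value

    def parse_and():
        nonlocal pos
        value = parse_not()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            value = k_and(value, parse_not())
        return value

    def parse_not():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "~":
            return k_not(parse_not())
        if token == "(":
            value = parse_or()
            pos += 1  # skip the closing parenthesis
            return value
        return facts.get(token)  # None when the property is unknown

    return parse_or()

def fired_rules(material, properties):
    """properties maps a property code to True (entailed) or False (negation entailed)."""
    facts = {m: (m == material) for m in MATERIALS}
    facts.update(properties)
    return [(i, head) for i, (body, head) in enumerate(RULES, start=1)
            if evaluate(body, facts) is True]

print(fired_rules("Ti", {"Cor": True, "Rat": True, "Wei": True}))
print(fired_rules("Comp", {"Dam": False}))
```

With the three titanium propositions entailed, three rules fire in support of Ti; a composite sentence with no statement about damage detection fires nothing, because the negated literal is unknown rather than true.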


2.2. Sentence detection 


The process of sentence extraction is based on the use of spe- 
cialized gazetteer lists for materials and properties. We extract 
only sentences which contain at least one reference to a material 
and one to a property. In addition, we also consider sentences that
contain a pronoun that refers to a material in a previous sentence,
as detected by anaphora resolution, and a reference to a property.


@materials
Al      Aluminium
Ti      Titanium
Comp    Composite

@properties
Cor     Corrosion resistance
Cos     Cost
Dam     Damage easily detected
Rat     Strength to lightness ratio
Ref     Resistant to fatigue
Saf     Safety issues
Wei     Lighter in Weight

@propositions
Al   Cor   Aluminium has good corrosion resistance
Ti   Cos   Initial high cost in using titanium
Ti   Cor   Titanium provides excellent corrosion resistance
Ti   Rat   Titanium has best strength to weight ratio among the metals
Ti   Wei   Titanium provides weight savings

@rules
Al & Dam => Al
Comp & ~Dam => Al
Comp & ~Dam => Ti
Ti & Cor => Ti
Ti & Cos => Comp
Ti & Rat => Ti
Ti & Wei => Ti

Fig. 1. Sample elements from the knowledge base.


2.3. Entailment checking


For each sentence in a list of extracted sentences, we check
whether the sentence (S) entails any proposition P from the
knowledge base which shares a reference to a particular material. The
process of entailment is based on deriving a number of
similarity/dissimilarity features from the entailment pair (S, P), similar
to those considered in [21], which extracts a number of match
and mismatch features in order to build a classification model.
We chose this approach as the extracted features do not require
computationally intensive methods to be derived and, as a machine
learning method, it allows us to consider various classification
methods. The features are summarized in Table 1. In general,
features are based on discovering overlap counts for matching
elements in the text and hypothesis pair. “Lexical” features include
the following: stopwords in common (in absolute and normalized
form), content words in common (in absolute and normalized
form) and all words in common (in absolute and normalized
form). “Related words” features are based on discovering synonym,
causal and entailment relation overlap counts based on WordNet. The
latter features are given in absolute and normalized form.
“Relations” features are based on discovering skip bigram and grammatical
relation overlaps. The skip bigram features include a normalized
count of skip bigram matches using all words and a similar normalized
count of skip bigrams created using only nouns and verbs. Grammatical
relations are based on dependency parsing and include features
based on the full relation and on a dependency pair of terms excluding
the dependency. The previous features are given in absolute and
normalized form. “Mismatch” features include a normalized count
of negated verbs that appear only in the hypothesis and not in
the text, the number of antonym pairs in the text hypothesis pair
and a normalized value for the latter feature.

Table 1
A description of the feature types used for entailment.

Feature type     Number of features   Description
Lexical          10                   Lexical overlap
Related words    6                    Related words derived through WordNet
Relations        6                    Shared relations derived through dependency parsing
MisMatch         3                    Features based on negation, antonyms
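To make the "Lexical" feature family concrete, the following Python sketch computes overlap counts in absolute and normalized form for one text and hypothesis pair. It is a simplified illustration under stated assumptions: the stopword list is a small hypothetical one, tokenization is a plain regular expression, and normalizing by the size of the hypothesis is an assumption, since the normalization used in [21] is not given here.

```python
import re

# Small hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "due", "has", "in", "is", "its", "of", "the", "to"}

def words(sentence):
    """Lower-case word types of a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))

def ratio(count, total):
    return count / total if total else 0.0

def lexical_features(text, hypothesis):
    t, h = words(text), words(hypothesis)
    common = t & h
    stop_common = common & STOPWORDS
    content_common = common - STOPWORDS
    return {
        "stop_abs": len(stop_common),
        "stop_norm": ratio(len(stop_common), len(h & STOPWORDS)),
        "content_abs": len(content_common),
        "content_norm": ratio(len(content_common), len(h - STOPWORDS)),
        "all_abs": len(common),
        "all_norm": ratio(len(common), len(h)),
    }

text = ("The chemical industry is the largest user of titanium "
        "due to its excellent corrosion resistance")
hypothesis = "Titanium provides excellent corrosion resistance"
print(lexical_features(text, hypothesis))
```

For this pair, four of the five hypothesis words appear in the text, which is the kind of high overlap a classifier can use as evidence for entailment.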

For a given source [22] the classification model discovers the
following entailments (where ⊨ denotes an entailment relation)
between sentences in the source and propositions in the
knowledge base:

The chemical industry is the largest user of titanium due to its
excellent corrosion resistance ⊨ Titanium provides excellent corrosion
resistance

The primary attributes that make titanium an attractive material
include an excellent strength-to-weight ratio, providing weight
savings ⊨ Titanium has best strength to weight ratio among the metals

The high strength and low density of titanium (40% lower than that
of steel) provide many opportunities for weight savings ⊨ Titanium
provides weight savings

This led to 3 rules being fired for the given source as shown in 
Table 3. 


3. Evidential reasoning 


An evidential reasoning framework based on DS and DSm the- 
ory is proposed to fuse information sources to aid in the decision 
making process. For a given information source, the applicability 
of propositional rules and information associated with information 
sources allows us to derive and estimate bbas outlined in the tex- 
tual analytics steps above. Before fusion of these bbas occur we ap- 
ply a pre-processing step to ensure consistency exists between 
bbas thereby obtaining the maximal consistent subset. This is 
important as imprecise and highly conflicting information can have 
a detrimental impact upon the fusion process. This section pro- 
vides an overview of the Theory of belief functions, distance mea- 
sures, evidential operators and discounting techniques some of 
which are applied in the proposed framework. 


3.1. Theory of belief functions 


DS theory (evidential theory) is a generalization of traditional
probability theory. It provides a mathematical formalism to
model our belief and uncertainty on possible decision options for
a given decision making process. The application of the
Dempster-Shafer rule of combination of belief functions has been
advantageous in the fusing of uncertain evidence supporting different
hypotheses [5]. However, when conflict between sources becomes
high, DS can generate errors in decision making. To address this
problem we use DSm, which can be considered a generalization
of DS. DSm overcomes limitations of DS by proposing new models
for the frame of discernment and new rules of combination that
take into account both paradoxical and uncertain information. A
review of DS and DSm theory is presented below.


3.1.1. DS theory 

In DS the frame of discernment (FOD), denoted by
$\Theta = \{\theta_1, \ldots, \theta_n\}$, contains a finite set of $n$
exclusive and exhaustive hypotheses. The set of subsets of $\Theta$ is
denoted by the power set $2^\Theta$. For instance, {Al, Comp, Ti} is
the frame for materials (Aluminum, Composite, Titanium) from which an
engineer selects one to construct a component.


3.1.2. DSm 

DSm proposes new models for the frame of discernment and
new rules of combination that take into account both paradoxical
and uncertain information. In DSm, under the free DSm model,
$\Theta = \{\theta_1, \ldots, \theta_n\}$ is assumed to be exhaustive but not
necessarily exclusive due to the intrinsic nature of its elements. The
set of subsets is denoted by the hyper power-set $D^\Theta$ (Dedekind’s
lattice), described in detail in [23], which is created with the
$\cup$ and $\cap$ operators. Using the hybrid DSm (hDSm) model, integrity
constraints can be set on elements of $\Theta$, reducing cardinality and
computation time compared to the free model. When Shafer’s model holds,
i.e. all exclusivity constraints on elements are included, $D^\Theta$
reduces to the power set $2^\Theta$. We denote by $G^\Theta$ the general set
on which the basic belief assignments are defined, i.e.
$G^\Theta = 2^\Theta$ when DS is adopted or $G^\Theta = D^\Theta$ when DSm is
preferred, depending on the nature of the problem. A normalized basic
belief assignment (bba) or mass function expressing belief assigned to
the elements of $G^\Theta$ provided by an evidential source is a mapping
$m: G^\Theta \rightarrow [0,1]$ representing the distribution of belief
satisfying the conditions:

\[ m(\emptyset) = 0, \qquad \sum_{A \in G^\Theta} m(A) = 1 \tag{2} \]

In general the condition $m(\emptyset) = 0$ need not hold for a basic belief
assignment [24]. As basic belief assignments in the paper are always
normalized, we do not make any further distinction and all
references to bbas are to normalized bbas. In evidence
theory, a probability range is used to represent uncertainty. The
lower bound of this range is called Belief (Bel) and the upper
bound Plausibility (Pl). The generalized Bel and Pl for any
proposition $A \in G^\Theta$ can be obtained by:

\[ \mathrm{Bel}(A) = \sum_{\substack{B \subseteq A \\ B \in G^\Theta}} m(B) \tag{3} \]

\[ \mathrm{Pl}(A) = \sum_{\substack{B \cap A \neq \emptyset \\ B \in G^\Theta}} m(B) \tag{4} \]
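As a small worked illustration of belief and plausibility, the Python sketch below evaluates both for a bba on the materials frame. It assumes Shafer's model (so the general set is the power set), represents focal elements as frozensets, and uses made-up mass values chosen only for the example.

```python
def bel(m, A):
    """Belief: total mass of the non-empty focal elements contained in A."""
    return sum(v for B, v in m.items() if B and B <= A)

def pl(m, A):
    """Plausibility: total mass of the focal elements that intersect A."""
    return sum(v for B, v in m.items() if B & A)

THETA = frozenset({"Al", "Comp", "Ti"})
TI = frozenset({"Ti"})
AL = frozenset({"Al"})

# Illustrative bba (hypothetical values): masses sum to 1.
m = {TI: 0.6, frozenset({"Al", "Ti"}): 0.3, THETA: 0.1}

for name, A in (("Ti", TI), ("Al", AL)):
    print(name, "Bel =", round(bel(m, A), 3), "Pl =", round(pl(m, A), 3))
```

The interval [Bel(A), Pl(A)] narrows as mass moves from the larger sets to singletons; here Ti has the interval [0.6, 1.0] and Al has [0.0, 0.4].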


3.1.3. Rules of combination 

In DS, Dempster’s rule of combination is symbolized by the
operator $\oplus$ and used to fuse two distinct sources of evidence $B_1$
and $B_2$ over the same frame $\Theta$. Let $\mathrm{Bel}_1$ and $\mathrm{Bel}_2$
represent two belief functions over the same frame $\Theta$ and $m_1$ and
$m_2$ their respective bbas. The combined belief function
$\mathrm{Bel} = \mathrm{Bel}_1 \oplus \mathrm{Bel}_2$ is obtained by the combination of
$m_1$ and $m_2$ as: $m(\emptyset) = 0$ and $\forall C \neq \emptyset \subseteq \Theta$

\[ m(C) = [m_1 \oplus m_2](C) = \frac{\sum_{A \cap B = C} m_1(A)\, m_2(B)}{1 - \sum_{A \cap B = \emptyset} m_1(A)\, m_2(B)} \tag{5} \]

Dempster’s rule of combination is associative
($[m_1 \oplus m_2] \oplus m_3 = m_1 \oplus [m_2 \oplus m_3]$) and commutative
($m_1 \oplus m_2 = m_2 \oplus m_1$).

In DSm the Proportional Conflict Redistribution Rule no. 5
(PCR5) has been proposed as an alternative to Dempster’s rule
for combining highly conflicting sources of evidence. Dempster’s
combination rule and PCR5 are only briefly detailed here; a complete
presentation of DSm can be found in [23].

\[ m_{PCR5}(A) = \sum_{\substack{X_1, X_2 \in G^\Theta \\ X_1 \cap X_2 = A}} m_1(X_1)\, m_2(X_2) + \sum_{\substack{X \in G^\Theta \\ X \cap A = \emptyset}} \left[ \frac{m_1(A)^2\, m_2(X)}{m_1(A) + m_2(X)} + \frac{m_2(A)^2\, m_1(X)}{m_2(A) + m_1(X)} \right] \tag{6} \]

All fractions in (6) which have a denominator of zero are discarded.
All propositions/sets in the formula are in canonical form. PCR5 is
commutative and not associative but quasi-associative.
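The difference between the two rules is easiest to see on a small example. The Python sketch below implements the two-source versions of Dempster's rule and PCR5 under Shafer's model, with focal elements as frozensets; the function names and the mass values are illustrative assumptions, not part of the framework's implementation.

```python
from itertools import product

def conjunctive(m1, m2):
    """Return masses on non-empty intersections and the list of conflicting pairs."""
    agreed, conflicts = {}, []
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        C = A & B
        if C:
            agreed[C] = agreed.get(C, 0.0) + a * b
        else:
            conflicts.append((A, a, B, b))
    return agreed, conflicts

def dempster(m1, m2):
    """Dempster's rule: discard the conflicting mass and renormalize."""
    agreed, conflicts = conjunctive(m1, m2)
    k = sum(a * b for _, a, _, b in conflicts)
    if 1.0 - k < 1e-12:
        raise ValueError("sources are in total conflict")
    return {C: v / (1.0 - k) for C, v in agreed.items()}

def pcr5(m1, m2):
    """PCR5 for two sources: return each partial conflict to the two sets
    involved, in proportion to the masses they contributed."""
    fused, conflicts = conjunctive(m1, m2)
    for A, a, B, b in conflicts:
        if a + b > 0:  # fractions with a zero denominator are discarded
            fused[A] = fused.get(A, 0.0) + a * a * b / (a + b)
            fused[B] = fused.get(B, 0.0) + b * b * a / (a + b)
    return fused

THETA = frozenset({"Al", "Comp", "Ti"})
TI, AL = frozenset({"Ti"}), frozenset({"Al"})

# Hypothetical bbas from two sources that partly disagree.
m1 = {TI: 0.6, THETA: 0.4}
m2 = {AL: 0.5, THETA: 0.5}

print("Dempster:", dempster(m1, m2))
print("PCR5:    ", pcr5(m1, m2))
```

In this example the conflicting mass is 0.3. Dempster's rule spreads it over all focal elements through normalization, whereas PCR5 returns it only to Ti and Al, the two hypotheses that actually conflict.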


3.1.4. Probabilistic transformation 

We need to obtain pignistic probabilities for decision making
purposes for this study. Fused beliefs are mapped to a probability
measure using the generalized pignistic transformation approach
DSmP [25], an alternative to the approach BetP proposed by Smets
and Kennes [26]. DSmP is advantageous as it can be applied to all
models (DS, DSm, hDSm). BetP is defined by $\mathrm{BetP}(\emptyset) = 0$
and, $\forall X \in 2^\Theta \setminus \{\emptyset\}$, by:

\[ \mathrm{BetP}(X) = \sum_{\substack{Y \in 2^\Theta \\ Y \neq \emptyset}} \frac{|X \cap Y|}{|Y|} \cdot \frac{m(Y)}{1 - m(\emptyset)} \tag{7} \]

DSmP is defined by $\mathrm{DSmP}_\epsilon(\emptyset) = 0$ and,
$\forall X \in G^\Theta \setminus \{\emptyset\}$, by

\[ \mathrm{DSmP}_\epsilon(X) = \sum_{Y \in G^\Theta} \frac{\sum_{\substack{Z \subseteq X \cap Y \\ C(Z) = 1}} m(Z) + \epsilon \cdot C(X \cap Y)}{\sum_{\substack{Z \subseteq Y \\ C(Z) = 1}} m(Z) + \epsilon \cdot C(Y)} \; m(Y) \tag{8} \]

where $G^\Theta$ corresponds to the hyper power set; $C(X \cap Y)$ and $C(Y)$
denote the DSm cardinal of the sets $X \cap Y$ and $Y$ respectively;
$\epsilon \geq 0$ is a tuning parameter which allows the value to reach the
maximum Probabilistic Information Content (PIC) of the approximation of
$m$ into a subjective probability measure [25]. The PIC value is applied
to measure distribution quality for decision-making. The PIC of a
probability measure denoted $P$ over a discrete finite set
$\Theta = \{\theta_1, \ldots, \theta_n\}$ is defined by:

\[ \mathrm{PIC}(P) = 1 + \frac{1}{H_{max}} \sum_{i=1}^{n} P\{\theta_i\} \log_2 \left( P\{\theta_i\} \right) \tag{9} \]

where $H_{max} = \log_2(n)$ is the maximum entropy value. A PIC value of
1 indicates the total knowledge to make a correct decision is
available whereas zero indicates the knowledge to make a correct
decision does not exist [25].
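The pignistic transformation and the PIC measure can be illustrated with a short Python sketch. It covers only BetP on singletons and PIC as defined in (7) and (9); DSmP is not implemented here. Shafer's model is assumed, focal elements are frozensets, and the mass values are made up for the example.

```python
import math

def betp(m, frame):
    """BetP of each singleton: every focal element Y shares its mass
    equally among its elements (for a singleton X, |X n Y| / |Y| = 1 / |Y|)."""
    norm = 1.0 - m.get(frozenset(), 0.0)
    return {x: sum(v / len(Y) for Y, v in m.items() if x in Y) / norm
            for x in frame}

def pic(p):
    """Probabilistic Information Content of a distribution over n > 1 outcomes."""
    neg_entropy = sum(v * math.log2(v) for v in p.values() if v > 0)
    return 1.0 + neg_entropy / math.log2(len(p))

THETA = frozenset({"Al", "Comp", "Ti"})

# Hypothetical fused bba.
m = {frozenset({"Ti"}): 0.6, frozenset({"Al", "Ti"}): 0.3, THETA: 0.1}

p = betp(m, THETA)
print({x: round(v, 4) for x, v in sorted(p.items())})
print("PIC =", round(pic(p), 4))
```

A uniform distribution gives a PIC of 0 and a distribution concentrated on one hypothesis gives a PIC of 1, matching the interpretation given in the text.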


3.2. Estimation of basic belief assignments 


From the output of the information extraction and textual 
entailment processes, the bbas for evidence sources can be esti- 
mated. In this research, to estimate the bba values, different factors 
are discounted which are described below. 


3.2.1. Rules fired 

For each evidence source, a number of rules can be fired which 
support a hypothesis. More than one hypothesis may be supported 
by a source. In our Use Case, the hypothesis relates to a material 
choice. The greater the number of rules fired to support a particular 


hypothesis, the more confidence we have in this hypothesis. Differ- 
ent weights will be assigned based on the number of rules fired. 


3.2.2. Priority sources 

Different sources are discounted differently depending on the
reliability of the source. In other words, different categories of
sources have different reliability weightings.


3.2.3. Priority of the rules 

Experts have ranked the different rules with respect to their pri- 
ority. This factor is utilized as a further discounting step when we 
consider the formation of maximal consistent subsets. 

Before these bbas are fused using the evidential reasoning 
framework, a maximal consistent subset is constructed with the 
aim of reducing the errors in the fusion process caused by conflicts 
in the evidence. This novel approach is detailed in the following 
section. 


3.3. Maximal consistent subsets 


Evidence acquired from diverse heterogeneous sources is often
inconsistent and conflicting. Furthermore, this evidence differs
in terms of reliability and priority. To reduce errors in the fusion
process caused by conflicts in the evidence, the construction of a
maximal consistent subset is proposed, which can aid with
determining which sources should be discounted before fusion. This
involves constructing a subset of sources that are consistent with
each other. Discounting could be applied to sources deemed
dissimilar or non-coherent. To measure the coherence between
evidence sources, an evidence distance measure can be applied.


3.4. Evidence distance measures 


Within a given problem domain, evidence obtained from various
sources may give rise to different bbas. The distances between
these bbas have an important effect upon the fusion of evidence in 
evidence theory. The distance between bbas can be defined to rep- 
resent dissimilarity between sources of evidence. To measure the 
distances between bbas a number of measures can be applied 
including the Metric Distance [15], Euclidean Distance [27] and the 
MaxDiff Distance proposed in [28,29]. A comprehensive review of 
distance measures can be found in [30]. In this research we apply 
the commonly used metric distance defined in [15]. 


3.4.1. Metric distance 

Let $E_1$ and $E_2$ represent evidence within a frame of discernment
$\Theta$. The corresponding mass functions are $m_1$ and $m_2$ with focal
elements $A_i$ and $B_j$. The distance between $m_1$ and $m_2$ can be
defined as:

\[ d(m_1, m_2) = \sqrt{\frac{1}{2} (\mathbf{m}_1 - \mathbf{m}_2)^{T} D \, (\mathbf{m}_1 - \mathbf{m}_2)} \tag{10} \]

where $\mathbf{m}_1$ and $\mathbf{m}_2$ are the bba vectors and $D$ is a matrix
of size $2^{|\Theta|} \times 2^{|\Theta|}$ whose elements are defined by
Jaccard’s indexes

\[ D(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad A, B \in 2^\Theta \tag{11} \]

The similarity between $m_1$ and $m_2$ can be obtained using the
distance measure $d(m_1, m_2) \in [0,1]$, which takes into consideration
both the values and specificity of the focal elements of each bba.
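A direct way to evaluate the metric distance is sketched below in Python. Only the focal elements of the two bbas need to be visited, because every other entry of the difference vector is zero. Focal elements are assumed to be non-empty frozensets, and the example masses are hypothetical.

```python
import math

def metric_distance(m1, m2):
    """Distance between two bbas using a Jaccard-weighted quadratic form."""
    focal = list(set(m1) | set(m2))
    diff = [m1.get(A, 0.0) - m2.get(A, 0.0) for A in focal]
    total = 0.0
    for i, A in enumerate(focal):
        for j, B in enumerate(focal):
            jaccard = len(A & B) / len(A | B)  # Jaccard index of A and B
            total += diff[i] * jaccard * diff[j]
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(total, 0.0) / 2.0)

THETA = frozenset({"Al", "Comp", "Ti"})
TI, AL = frozenset({"Ti"}), frozenset({"Al"})

print(metric_distance({TI: 1.0}, {TI: 1.0}))     # identical sources
print(metric_distance({TI: 1.0}, {AL: 1.0}))     # fully contradictory sources
print(metric_distance({TI: 1.0}, {THETA: 1.0}))  # certain source vs. ignorant source
```

Two sources committed to disjoint hypotheses are at distance 1, while a source expressing total ignorance sits at an intermediate distance from a source that is certain about Ti, because the Jaccard index gives partial credit for overlapping sets.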


3.5. Evidential operations


Evidence to support or refute design options in a decision
making process can be extracted from numerous information sources
including reports, journals and magazine articles. Some sources
may be regarded as being reliable or having a higher priority than
others. It is important to manage these factors in the fusion process
to reduce errors in reporting beliefs for decision options. Prior
knowledge is applied to estimate the discounting values for both
factors.


3.5.1. Discounting technique 

In discounting, a discounting factor $\alpha \in [0,1]$ can be applied to
weight a given factor according to a certain criterion [23]. For
instance, evidence extracted from an aviation journal is considered
higher quality than a blog post. In the latter case, the factor
transforms the belief of each source to reflect credibility. Shafer’s
discounting technique [31] has been proposed for the combination
of unreliable evidence sources and is used here to discount for the
various factors such as rules fired, the priority of the source and
the priority of rules. Incorporation of the factor $\alpha \in [0,1]$ in the
decision making process is defined as:

\[ m_\alpha(X) = \alpha \cdot m(X), \quad \forall X \subset \Theta \]
\[ m_\alpha(\Theta) = \alpha \cdot m(\Theta) + (1 - \alpha) \tag{12} \]

whereby $\alpha = 0$ represents a fully unreliable source and $\alpha = 1$ a fully
reliable source. The discounted mass is committed to $m(\Theta)$.
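Shafer's discounting in (12) amounts to scaling every mass by the reliability factor and moving the remainder to the whole frame. A minimal Python sketch follows, with focal elements as frozensets and hypothetical values for the bba and the factor.

```python
def discount(m, alpha, frame):
    """Scale all masses by alpha and commit the removed mass to the frame."""
    theta = frozenset(frame)
    discounted = {A: alpha * v for A, v in m.items() if A != theta}
    discounted[theta] = alpha * m.get(theta, 0.0) + (1.0 - alpha)
    return discounted

THETA = frozenset({"Al", "Comp", "Ti"})
TI = frozenset({"Ti"})

# Hypothetical bba from a source judged only half reliable.
m = {TI: 0.8, THETA: 0.2}
print(discount(m, 0.5, THETA))
```

With a factor of 1 the bba is unchanged, and with a factor of 0 all mass ends up on the frame, which expresses total ignorance about the source's claims.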


4. Methodology 


An overview of the proposed methodology, illustrating how the
information extraction mechanisms in support of Evidence Collation
provide input to the evidential reasoning processes, is
presented in Fig. 2. In this figure, for each evidence source a number
of information extraction steps, including textual entailment (steps
1-5), are carried out, leading to an evidence file. This file is processed
by a number of evidential reasoning steps. Depending on the
application area, evidential sources can be extracted from a diverse
number of resources including journals, white papers, standards,
online presentations and blog articles. In order to analyze the
textual information contained in these sources, each source must be
manually converted into a plain text format before
the information extraction phase.


4.1. Collation of evidence 


The process of Evidence Collation is based on the use of GATE
Embedded [32]. GATE Embedded is an open source object-oriented
framework developed in Java to provide embedded
language processing functionality in diverse applications. It
supports a number of processing resources such as sentence detection,
tokenization, tagging and, through a plugin, Stanford dependency
parsing [33]. To analyze textual content, GATE processing pipelines
can be constructed. In this research each source is processed in a
GATE pipeline whereby for a given source the text is tokenized
and split into sentences. Gazetteer lists are used to detect whether
a sentence contains at least one reference to a material and to a
property (step 1). Further processing is applied to the key
sentences to identify entailment features (step 2). The given sentences
are used to detect if any proposition in the knowledge base is
entailed and considered true, based on their extracted entailment
features and a trained entailment model as described in the
following sub-section (step 3). Based on a given hypothesis and a list of
associated true propositions, the set of rules is checked to see if
any rule fires (step 4). For a given source, the source name and zero
or more material selections are written to an evidence file (step 5).
Associated with each selection is an indexed list of rules which
fired and supported the selection.
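Step 1 of the pipeline can be approximated outside GATE with a few lines of Python, which may help in understanding what a gazetteer lookup does. This is a simplified stand-in, not the GATE pipeline itself: the gazetteer entries are small hypothetical lists, sentence splitting is a naive regular expression, and anaphora resolution is omitted.

```python
import re

# Hypothetical gazetteer lists: code -> surface forms (lower case).
MATERIALS = {
    "Al": {"aluminium", "aluminum"},
    "Ti": {"titanium"},
    "Comp": {"composite", "composites"},
}
PROPERTIES = {
    "Cor": {"corrosion"},
    "Cos": {"cost", "costs"},
    "Wei": {"weight", "lighter"},
}

def lookup(gazetteer, tokens):
    """Return the sorted codes whose surface forms occur among the tokens."""
    return sorted(code for code, forms in gazetteer.items() if tokens & forms)

def key_sentences(text):
    """Keep sentences that mention at least one material and one property."""
    result = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        materials = lookup(MATERIALS, tokens)
        properties = lookup(PROPERTIES, tokens)
        if materials and properties:
            result.append((sentence, materials, properties))
    return result

sample = ("Titanium provides excellent corrosion resistance. "
          "The report was published last year. "
          "Composites are widely used.")
for sentence, materials, properties in key_sentences(sample):
    print(materials, properties, sentence)
```

Only the first sentence of the sample is kept, since the second mentions neither a material nor a property and the third mentions a material without a property.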


4.1.1. Textual entailment 

An entailment model was developed to determine if an
extracted sentence obtained from a source entails a propositional
sentence defined in the constructed knowledge base. The model
was trained using a combination of the Recognizing Textual
Entailment challenge RTE2 [34] and RTE3 [35] training and test data sets
(RTE2+3). A total of 25 entailment features were selected as
described in Section 2.3. The entailment model was trained using
the Learning plug-in in GATE, which provides a range of different
classifier methods including SVM (support vector machines), C4.5
(decision tree learner), PAUM (online perceptron) and KNN
(K-nearest neighbor). An
initial assessment was carried out based on a 10-fold
cross-validation of the RTE2+3 data for a number of available classification
methods to assess the level of accuracy in terms of the F1 measure, which is
the harmonic mean of the precision and recall. The results of this
assessment are shown in Table 2. The “Options” column contains
the setting of particular classifier option values in the case that
non-default settings were tried and a higher accuracy was obtained
than for the default settings.

The values are in line with the level of accuracy reported by Inkpen
et al. [21]. As C4.5 returned the highest F1 measure, we chose this
classifier to build a model from all the training data and to act as
the entailment model in the pipeline. Given that the level of accuracy
was not high, we raised the confidence threshold for a positive
classification from 0.5 to 0.8 to reduce the number of false positives
produced by the entailment step.
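
The F1 measure and the raised confidence threshold can be sketched as follows. The confidence values are invented for illustration, and treating the threshold as inclusive (a confidence of exactly 0.8 is accepted) is an assumption, since the text does not say.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def is_entailed(confidence, threshold=0.8):
    """Accept a positive entailment only at or above the confidence threshold."""
    return confidence >= threshold


# Hypothetical classifier confidences for four sentence/proposition pairs.
scores = [0.55, 0.79, 0.80, 0.93]
accepted = [c for c in scores if is_entailed(c)]          # stricter 0.8 threshold
accepted_default = [c for c in scores if is_entailed(c, threshold=0.5)]
```

With the default threshold of 0.5 all four pairs would count as entailed; at 0.8 only the two most confident ones remain, trading recall for fewer false positives.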


4.2. Evidential reasoning 


4.2.1. Estimation of basic belief assignments 

The evidence file output by the Evidence Collation process is used
to estimate the bbas of the diverse sources of information (step 6).
An excerpt of this output for three different sources from the
pipeline is presented in Table 3. For each source, the output records
the hypothesis, the rules fired for that hypothesis and the origin of
the source. Estimation of bbas is based on the information in the
evidence file and the application of discounting using the classical
Shafer discounting approach. A source is discounted based on the
combination of the number of rules that were fired and the origin
of the source. Tables 4 and 5 list the discounting factors applied to
estimate the bbas. The values of these discounting factors have been
estimated using expert knowledge. For example, as the number of
rules fired for a hypothesis increases, confidence in that hypothesis
increases. Furthermore, evidence extracted from a journal article is
considered more reliable than evidence from a blog.

The two discounting factors are combined by taking their product.
For example, consider Source 1 in Table 3 and the discounting
factors in Table 4. Evidence from Source 1 was extracted from the
journal paper "Attributes, characteristics, and applications of
titanium and its alloys" [22]. As the source is a journal paper, the
first discounting factor, based on source origin, is 1. Table 3 shows
that 3 rules fired for the hypothesis Titanium from Source 1, so the
second discounting factor, obtained from Table 4, is 0.75. The
combined discounting factor for Source 1 is therefore
1 × 0.75 = 0.75. Using this discounting factor one can estimate the
bba for Source 1. The process applied in this research to estimate
bbas is described in Algorithm 1.
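
The combination of the two factors can be sketched in Python as a simple lookup and product. The factor values are those listed in Table 4; the dictionary keys and function names are illustrative assumptions.

```python
# Origin-based factors from Table 4 (key spelling is an assumption).
ORIGIN_DISCOUNT = {
    "journal": 1.0,
    "standards": 0.8,
    "white paper": 0.6,
    "magazine": 0.4,
    "web source": 0.2,
}


def rules_fired_discount(n_rules):
    """Factor from Table 4 based on the number of rules fired for a hypothesis."""
    if n_rules <= 1:
        return 0.25
    if n_rules == 2:
        return 0.5
    if n_rules == 3:
        return 0.75
    return 1.0  # four or more rules


def combined_discount(origin, n_rules):
    """Product of the origin factor and the rules-fired factor."""
    return ORIGIN_DISCOUNT[origin] * rules_fired_discount(n_rules)


source1 = combined_discount("journal", 3)        # worked example in the text: 0.75
web_two_rules = combined_discount("web source", 2)
```

Note that the function is only meaningful for hypotheses where at least one rule fired; hypotheses with no fired rules receive no mass at all.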
Table 2
Cross validation accuracy of entailment classifiers based on GATE Learning.

Classifier     F1 measure     Options
[illegible]    [illegible]    (-c 0.7 -tau 0.4)
PAULM          0.574          (-p 50 -n 5 -optB 0.3)
KNN            0.565          -k 3


Table 3
Rules fired for a selection of sources.

ID   Source type   Source   Hypothesis   Rule
1    Journal       [22]     Titanium     Ti & Cor => Ti
                                         [further rules illegible]
2    Blog          [36]     Aluminum     Al & Cor => Al
3    Web Source    [37]     Composite    [illegible] => Comp
                            Aluminum     Ti & Wei => Al


Table 4
Discounting factors applied to estimate basic belief assignments.

Rules Fired   Discount      Origin         Importance Discount
<= 1          0.25          Journal        1
= 2           0.5           White paper    0.6
= 3           0.75          Standards      0.8
>= 4          1             Magazine       0.4
                            Web source     0.2


Table 5
Additional discounting factors applied to basic belief assignments not members of the
maximal consistent subset.

Rule importance        Discount
1 Highly important     1
2 Important            0.66
3 Less important       0.33


Algorithm 1. Estimation of Basic Belief Assignments


STEP 1 Calculate discounting factor for each hypothesis on
Frame of Discernment
FOREACH hypothesis
COUNT the number of rules fired in the textual entailment
phase for a particular hypothesis and obtain discounting
factor
DETERMINE the source origin and obtain discounting factor
COMBINE both discounting factors based on rules fired and
source origin.
END FOREACH
STEP 2 Calculate discounting factor for each hypothesis on
Frame of Discernment
DETERMINE the hypotheses where rules have fired
ALLOCATE mass to hypotheses on the Frame of Discernment
where rules have fired.
DISCOUNT these masses according to their combined
discount factor obtained in Step 1 and discounting based on
(12)
ALLOCATE remaining mass to Θ

To estimate the bba based on evidence extracted from Source 1
we apply the discounting factor 0.75 to the single hypothesis
Titanium on the frame of discernment. Mass is only allocated to the
Titanium hypothesis as no rules fired for either the Aluminum or
Composite hypotheses in this instance. The remaining mass is then
allocated to Θ; the bba based on Source 1 is therefore estimated as
{m(Ti) = 0.75, m(Θ) = 0.25}. The same approach is applied to all
the sources used in this research.
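
A minimal Python sketch of classical Shafer discounting reproduces the Source 1 estimate. It assumes that, before discounting, all mass sits on the single supported hypothesis; how mass is shared when rules fire for several hypotheses is not reproduced here. The representation of focal elements as frozensets and the name `shafer_discount` are illustrative choices.

```python
THETA = frozenset({"Ti", "Al", "Comp"})  # frame of discernment


def shafer_discount(bba, alpha):
    """Classical Shafer discounting with factor alpha in [0, 1].

    m'(A) = alpha * m(A) for A != Theta
    m'(Theta) = alpha * m(Theta) + (1 - alpha)
    """
    discounted = {}
    for focal, mass in bba.items():
        if focal != THETA and alpha * mass > 0:
            discounted[focal] = alpha * mass
    discounted[THETA] = alpha * bba.get(THETA, 0.0) + (1.0 - alpha)
    return discounted


# Assumption: rules fired only for Titanium, so the undiscounted bba is m(Ti) = 1.
initial = {frozenset({"Ti"}): 1.0}
source1 = shafer_discount(initial, 0.75)
```

The same function could be reused for a second discounting phase by passing an already discounted bba and a further factor, although the exact way the authors apply the second phase is not specified in this section.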


4.2.1.1. Maximal consistent subset algorithm. After all bbas are esti- 
mated the next step in the process is to construct the Maximal 
Consistent Subset (step 7). This subset consists of a set of bbas 
which have been deemed to be consistent with one another. This 
is an essential step in the evidential reasoning process as fusing 
inconsistent subsets can result in erroneous and inaccurate results. 
Algorithm 2 summarizes the steps involved in constructing the set 
of consistent subsets. Subsets which do not reach the required con- 
sistency to be members within the Maximal Consistent Subset are 
further discounted based upon rule importance. This is the second 
phase of discounting. Each rule within the knowledge base has 
been graded in terms of importance by an expert. Table 5 presents 
the discounting factors based on the importance allocated to different
rules in the knowledge base. For example the rule: Aluminum
& (Corrosion & (Fatigue & Strength)) => Aluminum has been rated
as highly important by an expert as it addresses key requirements 
of a material namely resistance to corrosion, fatigue and high 
strength. Only bbas which do not reach the criteria to become 
members of the Maximal Consistent Subset are further discounted 
using these rules. 

To determine which bbas are consistent, and therefore members of
the Maximal Consistent Subset, we present the following algorithm.
First, the information content of each bba is calculated using the
PIC approach. The PIC is used to depict the strength of a critical
decision by a specific probability distribution [38]. The bba which
obtains the highest PIC value becomes the first member of the
maximal consistent subset. If more than one bba obtains the highest
PIC value, we choose one arbitrarily. Next, using the Metric Distance
we measure the similarity of the remaining bbas to those in the
maximal consistent subset. Other similarity measures, such as those
mentioned in Section 3.4, can also be applied. A bba obtaining a
similarity greater than or equal to the threshold of 0.6 is permitted
to join the maximal consistent subset. Different thresholds can be
selected; the threshold used in this research was chosen by an
expert. This process is repeated until either no bbas remain or none
of the remaining bbas obtains a similarity value greater than or
equal to the set threshold.


Algorithm 2. Calculation of Maximal Consistent Subset 


FORALL bbas calculate information content using PIC 
approach based on (9) 

SELECT bba with highest information content, add to maximal 
consistent subset. If more than one bba have the same PIC 
value, choose one arbitrarily 

REPEAT 
FIND most similar bbas using distance measure (based on 
(10)) to those bbas in maximal consistent subset 
IF similarity value >= threshold then join bba to maximal
consistent subset

UNTIL similarity values for all remaining bbas not in maximal 
consistent subset obtain value < threshold or no bbas 
remain 
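

Algorithm 2 can be sketched in Python as a greedy loop. Several details are assumptions, because equations (9) and (10) are not repeated in this section: PIC is computed on the classical pignistic probabilities as 1 + (1/log2 N) Σ p log2 p, similarity is taken as one minus the Jousselme distance, and the similarity of a candidate to the current subset is taken as its mean similarity to the members. The three toy bbas are invented for illustration.

```python
import math

THETA = frozenset({"Ti", "Al", "Comp"})


def betp(bba):
    """Classical pignistic transformation: share each mass equally over its focal set."""
    prob = {h: 0.0 for h in THETA}
    for focal, mass in bba.items():
        for h in focal:
            prob[h] += mass / len(focal)
    return prob


def pic(bba):
    """Probabilistic information content of the pignistic distribution (assumed form)."""
    probs = betp(bba).values()
    entropy_term = sum(p * math.log2(p) for p in probs if p > 0)
    return 1.0 + entropy_term / math.log2(len(THETA))


def similarity(m1, m2):
    """One minus the Jousselme distance between two bbas (assumed metric)."""
    focal = set(m1) | set(m2)
    diff = {a: m1.get(a, 0.0) - m2.get(a, 0.0) for a in focal}
    total = sum(diff[a] * diff[b] * len(a & b) / len(a | b)
                for a in focal for b in focal)
    return 1.0 - math.sqrt(max(0.5 * total, 0.0))


def maximal_consistent_subset(bbas, threshold=0.6):
    """Greedy construction following Algorithm 2; returns member names in join order."""
    remaining = dict(bbas)
    first = max(remaining, key=lambda k: pic(remaining[k]))  # ties: first found
    members = {first: remaining.pop(first)}
    while remaining:
        # Mean similarity to current members (the aggregation is an assumption).
        score = {k: sum(similarity(b, m) for m in members.values()) / len(members)
                 for k, b in remaining.items()}
        best = max(score, key=score.get)
        if score[best] < threshold:
            break
        members[best] = remaining.pop(best)
    return list(members)


toy = {
    "A": {frozenset({"Comp"}): 1.0},
    "B": {frozenset({"Comp"}): 0.8, THETA: 0.2},
    "C": {frozenset({"Ti"}): 0.9, THETA: 0.1},
}
subset = maximal_consistent_subset(toy)
```

On the toy input, A has the highest PIC and seeds the subset, B is close enough to join, and C falls well below the threshold. No claim is made that this sketch reproduces the subset reported in the Use Case.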


4.2.1.2. Information Fusion. After discounting has been applied and
the maximal consistent subsets defined, all bbas are fused together
using techniques from DS and DSm theory (step 8). A Java application
has been developed by the authors to fuse any number of estimated
bbas. Using this application, either Dempster's rule of combination
(DS theory) or the PCR5 combination rule (DSm theory) can be applied
to fuse both the bbas in the maximal consistent subset and the
further discounted bbas outside it.
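
For two bbas under Shafer's model, both combination rules can be sketched in a few lines of Python. This is an illustration, not the authors' Java application: the function names are hypothetical, and how the application sequences the fusion of more than two bbas (PCR5 is not associative) is not stated in the text. The example fuses two bbas of the kind estimated above, m1 = {Ti: 0.75, Θ: 0.25} and m2 = {Comp: 0.5, Θ: 0.5}.

```python
from itertools import product

THETA = frozenset({"Ti", "Al", "Comp"})


def conjunctive(m1, m2):
    """Conjunctive consensus: masses on non-empty intersections plus conflicting pairs."""
    fused, conflicts = {}, []
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            fused[inter] = fused.get(inter, 0.0) + mass_a * mass_b
        elif mass_a * mass_b > 0:
            conflicts.append((a, mass_a, b, mass_b))
    return fused, conflicts


def dempster(m1, m2):
    """Dempster's rule: drop the conflicting mass and renormalise."""
    fused, conflicts = conjunctive(m1, m2)
    k = sum(mass_a * mass_b for _, mass_a, _, mass_b in conflicts)
    if k >= 1.0:
        raise ValueError("sources are in total conflict")
    return {focal: mass / (1.0 - k) for focal, mass in fused.items()}


def pcr5(m1, m2):
    """PCR5 for two sources: return each partial conflict to the two sets
    involved, in proportion to the masses that produced it."""
    fused, conflicts = conjunctive(m1, m2)
    for a, mass_a, b, mass_b in conflicts:
        fused[a] = fused.get(a, 0.0) + mass_a ** 2 * mass_b / (mass_a + mass_b)
        fused[b] = fused.get(b, 0.0) + mass_b ** 2 * mass_a / (mass_a + mass_b)
    return fused


TI, COMP = frozenset({"Ti"}), frozenset({"Comp"})
m1 = {TI: 0.75, THETA: 0.25}
m2 = {COMP: 0.5, THETA: 0.5}
ds_result = dempster(m1, m2)
pcr5_result = pcr5(m1, m2)
```

In this example the conflicting mass is 0.375. Dempster's rule yields m(Ti) = 0.6, m(Comp) = 0.2 and m(Θ) = 0.2, whereas PCR5 yields m(Ti) = 0.6, m(Comp) = 0.275 and m(Θ) = 0.125, because the conflict is redistributed only to Ti and Comp.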


5. Use Case 


The selection of material(s) to construct a component is a key
design decision in Engineering. It is important to state that the
proposed framework is applicable to other fields where important
decisions have a critical effect on projects. Materials selection is a
task normally carried out by design and materials engineers. The 
aim of materials selection is the identification of materials, which 
after appropriate manufacturing operations, will have the dimen- 
sions, shape and properties necessary for the product or compo- 
nent to demonstrate it meets its requirements. Properties may 
include physical properties, electrical properties, magnetic proper- 
ties, mechanical properties, chemical properties and manufactur- 
ing properties [39]. 

The final choice of material can be viewed as a design decision 
which is subject to uncertainty as it is often not clear early in the
life-cycle which properties or attributes are relevant to the design
decision or their level of importance. Uncertainty may also arise through
the level of imprecision in attribute values where an attribute may 
take a value within a range of values [40]. 

In this study, a key issue for a design engineer is the choice of a 
particular material, for example aluminum, titanium or composite,
for the construction of a rib post component between the wing rib 
and spar. The rib post is an important element within an aircraft 
wing providing the structural join between the wing spar and 
internal rib. The aim of this Use Case is to demonstrate the appli- 
cation of our proposed framework to utilize information extraction 
and textual entailment techniques for the estimation of bbas along 
with evidence theory to fuse disparate sources to aid decision mak- 
ing in an Engineering domain. A colleague from QUB’s Aeronautical 
Engineering Department acted as a design engineer in our study. 
He formed a knowledge base and checked it manually for consis- 
tency. The knowledge base consisted of 64 propositions and 54 
rules. His construction of the knowledge base centered on consid- 
ering propositions and consequently rules that are particularly rel- 
evant to the problem. The information extraction and entailment 
steps were applied as described in the Methodology section to al- 
low for discovery of supported rules. DSmT has been selected to 
fuse together pieces of evidence using the PCR5 rule of combina- 
tion. This rule has been selected as it has been designed to cope 
with highly conflicting and uncertain information. However, other
combination rules such as Dempster's rule of combination can also
be applied within the framework. The Metric distance measure is
applied to determine similarity/highlight potential conflict be- 
tween sources. This measure has been selected and applied in 
the Maximal Consistent Subset algorithm as it has been proved 
an effective principled approach when measuring distance be- 
tween bbas. Similarity is calculated to weight agreement between 
sources as it is known that conflicting and inconsistent data can be 
detrimental to the decision making process. Determining the Maximal
Consistent Subsets aids in determining which sources should be
further discounted. By both determining the maximal consistent
subsets and applying discounting factors to dissimilar sources we aim
to improve the correctness of fusion results. Decision making is
based on pignistic probabilities, with results presented using both
the DSmP and BetP transformation methods for comparative purposes.


5.1. Sources 


Forty-nine evidence sources related to the general issues of
materials and aeronautical design were collected. These varied in
origin and were retrieved through web searches of journals, white
papers, magazines, web pages/blogs and International Aviation
Standards which we considered relevant to the design problem. The
sources are summarized in Table 6.


5.2. Estimation of basic belief assignments 


Output knowledge from the information extraction/textual
entailment steps is used to estimate the bbas of the diverse sources
of information. The classifier selected for the textual entailment
process is C4.5, as it obtained the highest accuracy among the four
classification methods. An excerpt of this output is presented in
Table 3. For each source, the output records the hypothesis as a
material selection, the rules fired and the origin of the source.
Using this information we estimate a bba for each source. Table 4
lists the discounting factors applied to estimate the bbas based upon
the number of rules fired and the source origin. The bbas are
discounted using the classical Shafer discounting approach described
in Section 4.2. This approach was applied to all the output sources
from the entailment step to produce a total of 20 bbas, defined in
Table 7, where {E1,...,E20} refer to the different evidence sources
and the columns Ti, Al, Comp and Θ show how belief is distributed
for each bba. Only 20 of the 49 sources produced supporting
evidence. This number would likely have been higher in a real world
scenario where an expert had constructed the knowledge base,
allowing for a more detailed coverage of appropriate propositions
and rules, and where all sources were evaluated in detail in terms of
relevancy.


5.3. Construction of maximal consistent subset 


It is known that conflict between evidence sources can have a
detrimental impact upon the evidential reasoning process. To address
this, an algorithm to construct the maximal consistent subset amongst
a group of bbas was outlined in Section 4.2. This algorithm was
applied to the 20 bbas defined as {E1,...,E20}. PIC values were
calculated for each bba in the first step. The bba E3 obtained the
highest PIC value and became the first member of the maximal
consistent subset. To determine the next member(s) of the Maximal
Consistent Subset the Metric distance is applied to measure the
similarity between the members of the Maximal Consistent Subset and
the non-members. A cut off threshold of 0.6 was selected by the
expert system designer and judged as an acceptable threshold
similarity value. If the similarity value obtained by comparing a bba
with the Maximal Consistent Subset was greater than or equal to this
threshold then this bba became a member of the Maximal Consistent
Subset. This was repeated until none of the remaining bbas reached
the threshold for membership of the Maximal Consistent Subset. The
resulting Maximal Consistent Subset consisted of 6 members
{E1, E3, E5, E7, E8, E20}. A second phase of discounting based on the
rule importance described in Section 4 was applied to the remaining
14 bbas which did not reach the specified similarity threshold. The
application of this rigorous approach provides the bbas which are
used as input into the evidential reasoning application to aid in the
decision making process.


Table 6
Document sources.


Source type                  Number of sources


Journals                     11
Magazines                    2
Standards                    1
Web sources (blogs, etc.)    30
Whitepapers                  5


Table 7

Estimated basic belief assignments.
Evidence   Ti       Al       Comp     Θ
E1         0.75     0        0        0.25
E2         0        0        0.5      0.5
E3         0        0        1        0
E4         0.25     0        0.25     0.5
E5         0        0.05     0        0.95
E6         0.017    0.0417   0.0167   0.925
E7         0        0        0.05     0.95
E8         0        0.05     0        0.95
E9         0        0        0.05     0.95
E10        0        0        0.05     0.95
E11        0        0        0.1      0.9
E12        0        0        0.05     0.95
E13        0        0        0.05     0.95
E14        0.1      0        0        0.9
E15        0.15     0        0        0.85
E16        0        0        0.15     0.85
E17        0        0.075    0.15     0.775
E18        0        0.225    0.15     0.625
E19        0.15     0        0.15     0.7
E20        0.3      0        0.4      0.3


5.4. Evidential reasoning 


In this Use Case, an engineer is tasked with selecting a material 
to construct a rib post from the set: Aluminum (Al), Titanium (Ti) 
or Composite (Comp). The following Frame of Discernment
Θ = {Al, Ti, Comp}, in accordance with Shafer's model, is used to
model the fusion problem. A Java application has been developed
by the authors implementing the DSm Theory to fuse the diverse 
bbas estimated in the steps above. In total, the PCR5 Rule of Com- 
bination is applied to fuse all 20 bbas. To highlight the impact the 
construction of Maximal Consistent Subsets and therefore the 
reduction of conflict and uncertainty between information sources 
has upon the evidential reasoning process we present results 
where (1) all subsets are viewed as equal and are fused, (2) only 
the Maximal Consistent Subsets are fused and (3) the Maximal 
Consistent Subsets are fused with the discounted subsets. 


5.4.1. Fusion of all sources when no maximal consistent subset 
constructed 

In this experiment, no pre-processing has been performed to 
determine the Maximal Consistent Subset (i.e. set of consistent 
sources). Instead, all 20 bbas are assumed to be equal with no
additional discounting applied to conflicting bbas. The results for
this scenario are presented in Table 8, where pignistic probabilities
for each hypothesis are given using both the generalized BetP
and DSmP approaches. Interestingly, high pignistic probabilities 
are obtained for the Comp hypothesis by both generalized pignistic 
transformation approaches. These are followed by Ti and finally Al. 
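
As an illustration of how a fused bba is turned into probabilities for decision making, the following Python sketch applies the classical pignistic transformation BetP under Shafer's model to a single bba of the form {m(Ti) = 0.3, m(Comp) = 0.4, m(Θ) = 0.3}. It does not reproduce the generalized BetP or DSmP computations of the authors' application, and the function name is an assumption.

```python
THETA = frozenset({"Ti", "Al", "Comp"})


def betp(bba):
    """BetP(x) = sum over focal sets A containing x of m(A) / |A|."""
    prob = {h: 0.0 for h in sorted(THETA)}
    for focal, mass in bba.items():
        for h in focal:
            prob[h] += mass / len(focal)
    return prob


bba = {frozenset({"Ti"}): 0.3, frozenset({"Comp"}): 0.4, THETA: 0.3}
probabilities = betp(bba)
decision = max(probabilities, key=probabilities.get)
```

Here the 0.3 of mass on Θ is shared equally between the three hypotheses, giving 0.4 for Ti, 0.5 for Comp and 0.1 for Al, so Comp would be selected.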


5.4.2. Fusion of maximal consistent subsets only 

In this scenario the results presented have been obtained from
fusing only the 6 bbas which are members of the Maximal Consistent
Subset. Using the algorithmic approach described in the Methodology
section, only 6 of the 20 bbas were determined to be similar and
therefore consistent. By fusing consistent bbas the aim is to
improve the quality of the pignistic probabilities obtained in the
fusion process for decision making purposes. Table 9 presents the
results obtained using this approach. Compared with the results
obtained when no Maximal Consistent Subset was constructed
(Table 8), Comp again obtains the highest probability, but with less
weight allocated to this hypothesis (0.684 against 0.783) and more to
the Ti hypothesis (0.315 against 0.204).


5.4.3. Fusion of maximal consistent subsets and discounted subsets 

In this scenario we present results obtained when the Maximal 
Consistent Subsets were fused with the remaining 14 bbas. In this 
case however, the 14 bbas which did not reach the consistency cri- 
teria of the Maximal Consistent Subset are further discounted. This 
discounting is important as these diverse sources could be possibly 
conflicting and inconsistent with the preprocessed Maximal 
Consistent Subset. The discounting reduces the potential for con- 
flict allowing additional knowledge to be applied in the decision 
making process. The aim of this is to obtain realistic pignistic prob- 
abilities for the different hypotheses which are not detrimentally 
affected by potential conflict in the process. The results in
Table 10 show that, as in Tables 8 and 9, the Comp hypothesis obtains
the highest pignistic probability. However, this probability (0.733)
is lower than the value obtained when no discounting or Maximal
Consistent Subset was applied (0.783) and slightly higher than the
value obtained using only the 6 bbas in the Maximal Consistent
Subset (0.684).

In all scenarios the material Al obtained low pignistic
probabilities. This is because only 5 of the 20 bbas contained mass
allocated to the Al hypothesis, and the mass allocated was minimal,
as the information extracted from the original evidence sources
provided little support for this material. If other evidence sources
were utilized this may not be the case. In comparison, 15 of the
estimated bbas contained mass assigned to the Comp hypothesis.
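
The support counts quoted above can be checked directly against Table 7. The short Python sketch below encodes the table rows as (Ti, Al, Comp, Θ) tuples and counts the bbas that assign non-zero mass to each hypothesis; the variable names are illustrative.

```python
# Rows of Table 7 as (Ti, Al, Comp, Theta).
TABLE7 = {
    "E1": (0.75, 0, 0, 0.25),       "E2": (0, 0, 0.5, 0.5),
    "E3": (0, 0, 1, 0),             "E4": (0.25, 0, 0.25, 0.5),
    "E5": (0, 0.05, 0, 0.95),       "E6": (0.017, 0.0417, 0.0167, 0.925),
    "E7": (0, 0, 0.05, 0.95),       "E8": (0, 0.05, 0, 0.95),
    "E9": (0, 0, 0.05, 0.95),       "E10": (0, 0, 0.05, 0.95),
    "E11": (0, 0, 0.1, 0.9),        "E12": (0, 0, 0.05, 0.95),
    "E13": (0, 0, 0.05, 0.95),      "E14": (0.1, 0, 0, 0.9),
    "E15": (0.15, 0, 0, 0.85),      "E16": (0, 0, 0.15, 0.85),
    "E17": (0, 0.075, 0.15, 0.775), "E18": (0, 0.225, 0.15, 0.625),
    "E19": (0.15, 0, 0.15, 0.7),    "E20": (0.3, 0, 0.4, 0.3),
}
HYPOTHESES = ("Ti", "Al", "Comp")

# Number of bbas giving non-zero mass to each hypothesis.
support = {h: sum(1 for row in TABLE7.values() if row[i] > 0)
           for i, h in enumerate(HYPOTHESES)}
# Largest single mass assigned to Al by any source.
largest_al = max(row[1] for row in TABLE7.values())
```

The counts come out as 5 bbas for Al and 15 for Comp, matching the figures in the text, and the largest mass any single source assigns to Al is 0.225.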


Table 8

Fusion of all subsets where all subsets are assumed equal.
Hypothesis   Generalized BetP   DSmP (ε=0)
Comp         0.783              0.783
Ti           0.204              0.204
Al           0.014              0.014


Table 9

Fusion of maximal consistent subsets only.
Hypothesis   BetP    DSmP
Comp         0.684   0.684
Ti           0.315   0.315
Al           0.000   0.000


Table 10

Fusion of maximal consistent subsets and discounted subsets.
Hypothesis   BetP    DSmP
Ti           0.259   0.259
Comp         0.733   0.733
Al           0.008   0.008


6. Discussion


This Use Case aimed to illustrate an exploratory application of
the proposed novel conceptual framework integrating the areas
of information extraction and evidential reasoning. This is a first
attempt at collating and combining evidence from a number of
natural language based qualitative sources related to a material
selection problem. It has shown encouraging preliminary results,
admittedly based on a number of ad hoc settings. Certain
improvements are needed in the different steps of this framework
to increase its applicability and allow for a proper evaluation. A
property is currently based on a binary choice between its presence
or absence; however, for real world problems a property often takes
a categorical value to allow for varying degrees (e.g. the property
elasticity could have the categories {low, medium, high}), so we
would need to consider how best a knowledge base should be
constructed to allow for this. Also, our knowledge base was
constructed manually and its consistency checked manually. This
served the purpose of the Use Case; however, it would be better
to automate its construction, and further refinement of the process
is required to make it practically applicable. For example a method
for automated proposition discovery would utilize NLP based pro- 
cesses relying on specialized lexicons and template matching 
based on an analysis of the sources. Rules would still need to be 
specified by an expert but model checking is needed to check for 
a consistent set of rules as in [41]. In the estimation of the bbas, 
discounting is performed using knowledge obtained from the 
information extraction process. The discounting was based on 
rules fired and source origin with an additional discounting based 
on rule importance. The discounting was not an automated process 
and was guided by the design engineer. More complex criteria for
discounting could be incorporated, and appropriate methods developed
to guide this discounting process. Also a more refined process for
judging the relevancy of sources is needed. For instance, all web
sources had the same discount applied, however some sources may be
more reliable than others. In addition, the set of retrieved
documents may omit sources of information which are particularly
apposite to a given problem. But in the absence of existing available
data specific to the given problem, we were constrained to providing
our own. The construction of Maximal Consistent Subsets is
beneficial to the evidential reasoning stage as inconsistencies and 
conflict are identified and addressed before fusion. This is impor- 
tant as these inconsistencies and uncertainty can have an adverse 
effect on the decision making process. Further work could be per- 
formed on the given algorithm to construct different Maximal Con- 
sistent Subsets, for instance, different distance measures could be 
applied to measure the similarity between bbas. Finally in the evi- 
dential reasoning process the DSm Theory was employed to handle 
uncertainty between evidence sources when fusing information 
using the PCR5 rule of combination. This stage is not limited to this 
one approach and it would be interesting to apply other combina- 
tion rules. In summary, it would be more appropriate especially for 
the purposes of evaluation that the sources were available as part 
of an actual industry based design, where each source is more 
tightly coupled to the identified material selection problem. The 
sources could be based on a number of different company engi- 
neers’ reports concerning the material selection issue. Other 
sources could be supporting documents that the engineers con- 
sider valuable. In the latter case, each engineer writes using the 
same language style and it would be possible to confer with the 
engineers on the outcomes of the evaluation. The engineers could also
assist in the process by which a knowledge base is automatically
generated. Discounting mechanisms may still be
applicable as each report/document may not receive the same 
weighting. 


7. Conclusion


In this paper, a novel conceptual framework integrating the di- 
verse areas of information extraction, textual entailment and evi- 
dential reasoning is proposed to solve decision making issues 
under the constraints of uncertain and inconsistent information. 
An algorithm to determine a set of maximal consistent subsets is 
presented based on the Metric distance of evidences. To estimate 
basic belief assignments textual analysis was performed with the 
assistance of a manually constructed knowledge base. A Use Case 
based in the Aerospace domain is provided to illustrate the effec- 
tiveness of the proposed approach. This Use Case highlighted the 
importance of applying discounting factors based on knowledge 
extracted using textual analysis approaches and measuring consistency
between evidential sources before making decisions. Furthermore, this
framework could be applied to other problem areas involving the
selection of materials based on qualitative descriptions linking
properties to recommendations of materials, with appropriate
enhancements to make it an automated process. Our framework is only
a first step towards seamlessly extracting knowledge from textual
documents and interpreting this potentially conflicting knowledge
using evidential reasoning. We have demonstrated an initial
capability but we recognize that the knowledge base as specified is
key, and further NLP/AI techniques are needed to automate its
construction for any working system: as the range and diversity of
sources becomes greater, the manual construction of an extensive and
comprehensive knowledge base is laborious.